Skip to content

fix(retrieval): surface silent IPNI timeouts in logs#539

Merged
SgtPooki merged 1 commit into
mainfrom
feat/retrieval-ipni-observability
May 19, 2026
Merged

fix(retrieval): surface silent IPNI timeouts in logs#539
SgtPooki merged 1 commit into
mainfrom
feat/retrieval-ipni-observability

Conversation

@SgtPooki
Copy link
Copy Markdown
Collaborator

What changed

Two adjacent observability bugs left IPNI failures invisible whenever the outer pg-boss retrieval-job timeout fired:

  1. `IpniVerificationService.verify` check-order. Catch checked `signal?.aborted` before `verificationSignal.aborted`. On a race where both signals had asserted by the time the catch handler ran, `signal.throwIfAborted()` re-threw before the `ipni_verification_timed_out` log could fire. The inner-timeout log path was effectively dead code whenever outer ≤ inner. Inner check now runs first so the log fires whenever the inner timer aborted.

  2. `RetrievalService.verifyIpniForRetrieval` silent abort. The catch's signal-aborted branch returned a silent `failure.timedout` with no event log. Added `retrieval_ipni_verification_timed_out` so the retrieval-side caller records the abort.

Implements option C from #524. Stacks on #538.

Files

  • `apps/backend/src/ipni/ipni-verification.service.ts` — swap catch order so inner-timeout check wins same-tick races.
  • `apps/backend/src/retrieval/retrieval.service.ts` — add `retrieval_ipni_verification_timed_out` log in the abort branch.
  • `apps/backend/src/ipni/ipni-verification.service.spec.ts` — new test verifying `ipni_verification_timed_out` fires when both signals abort in the same tick.
  • `apps/backend/src/retrieval/retrieval.service.spec.ts` — new test verifying `retrieval_ipni_verification_timed_out` fires when outer signal aborts and IPNI verify throws.

How to verify

```
pnpm --filter dealbot-backend typecheck
pnpm --filter dealbot-backend lint
pnpm --filter dealbot-backend test
```

After deploy, staging logs should show `ipni_verification_timed_out` events 1:1 with `retrievalStatus=failure.timedout` deltas for SPs without IPNI advertisements (currently SP 5 Mongo2Stor, SP 17 superusey-calib in staging).

Notes

Base branch is `feat/retrieval-parallel-transport-counter` (#538). Re-target to `main` after that PR merges.

@FilOzzy FilOzzy added this to FOC May 14, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC May 14, 2026
@SgtPooki SgtPooki self-assigned this May 14, 2026
@SgtPooki SgtPooki force-pushed the feat/retrieval-ipni-observability branch from a9c9f5a to 249fcc1 Compare May 14, 2026 18:48
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC May 15, 2026
@BigLep BigLep requested review from Copilot and silent-cipher May 15, 2026 05:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Two small observability fixes for IPNI verification timeouts in the retrieval check. The first reorders the catch handler in IpniVerificationService.verify so the inner-timeout log fires even when the outer job signal aborts in the same tick. The second adds a retrieval_ipni_verification_timed_out warn log in RetrievalService.verifyIpniForRetrieval's outer-abort branch, replacing a silent failure.timedout return. Both changes have accompanying unit tests.

Changes:

  • Swap catch-order in ipni-verification.service.ts so timeoutSignal.aborted is checked before the external signal?.aborted rethrow.
  • Emit retrieval_ipni_verification_timed_out log in retrieval.service.ts when the outer signal aborts during IPNI verify.
  • Add unit tests for both code paths.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
apps/backend/src/ipni/ipni-verification.service.ts Reorder catch checks so inner-timeout log wins same-tick races.
apps/backend/src/ipni/ipni-verification.service.spec.ts Test that ipni_verification_timed_out fires when both signals abort in the same tick.
apps/backend/src/retrieval/retrieval.service.ts Add structured warn log on outer-signal abort in verifyIpniForRetrieval.
apps/backend/src/retrieval/retrieval.service.spec.ts Test that retrieval_ipni_verification_timed_out fires when outer signal aborts and IPNI verify throws.

Copy link
Copy Markdown
Collaborator

@silent-cipher silent-cipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me aside from the code quality ci failure.

@github-project-automation github-project-automation Bot moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FOC May 15, 2026
@SgtPooki SgtPooki force-pushed the feat/retrieval-ipni-observability branch from 249fcc1 to 041e64e Compare May 18, 2026 15:11
Base automatically changed from feat/retrieval-parallel-transport-counter to main May 19, 2026 12:38
Two adjacent observability bugs left IPNI failures invisible whenever
the outer pg-boss retrieval-job timeout fired:

1. IpniVerificationService.verify checked signal?.aborted before
   verificationSignal.aborted. On a race where both signals were
   asserted by the time the catch handler ran, signal.throwIfAborted()
   re-threw before the ipni_verification_timed_out log could fire.
   Inner check now runs first so the inner timeout log fires whenever
   the inner timeout signal aborted, regardless of outer state.

2. RetrievalService.verifyIpniForRetrieval's catch returned a silent
   failure.timedout when signal?.aborted, with no log. Added
   retrieval_ipni_verification_timed_out so the retrieval-side caller
   records the abort.

Tracks dealbot#524 option C. Pairs with #538 (parallel IPNI + transport).
@SgtPooki SgtPooki force-pushed the feat/retrieval-ipni-observability branch from 041e64e to ef9b48f Compare May 19, 2026 13:51
@SgtPooki SgtPooki merged commit 1ec72a2 into main May 19, 2026
7 checks passed
@SgtPooki SgtPooki deleted the feat/retrieval-ipni-observability branch May 19, 2026 13:54
@github-project-automation github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FOC May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

5 participants